- 
                Notifications
    You must be signed in to change notification settings 
- Fork 118
Add comprehensive unit tests to cover edge cases for Delta log path parsing #408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit introduces two new unit tests to enhance the robustness
of our Delta Lake log path parsing:
1. test_invalid_version_and_part_lengths:
   - Validates correct handling of invalid version lengths (VERSION_LEN)
   - Checks parsing of invalid multipart checkpoint lengths (MULTIPART_PART_LEN)
   - Ensures proper error handling for invalid UUID lengths (UUID_PART_LEN)
2. test_empty_filename:
   - Verifies that attempts to parse URLs ending with a slash (resulting
     in empty filenames) are correctly rejected
These tests ensure that our parsing logic correctly identifies and
rejects invalid log paths, maintaining the integrity of our Delta
Lake log processing.
    test_ok_none_and_unknown_cases: - Validates handling of non-numeric version prefixes (Ok(None) cases) - Verifies correct parsing of missing file extensions (Ok(None) cases) - Ensures proper handling of empty file extensions (Ok(Some) with Unknown type)
| Codecov ReportAttention: Patch coverage is  
 
 Additional details and impacted files@@            Coverage Diff             @@
##             main     #408      +/-   ##
==========================================
- Coverage   78.34%   77.86%   -0.48%     
==========================================
  Files          49       49              
  Lines       10282    10360      +78     
  Branches    10282    10360      +78     
==========================================
+ Hits         8055     8067      +12     
- Misses       1775     1833      +58     
- Partials      452      460       +8     ☔ View full report in Codecov by Sentry. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look at this!
This PR definitely adds some good missed cases, but I think most of the newly added cases were already covered by existing tests.
Suggest to add the missed cases to their respective existing tests (in the same style), rather than making entirely new tests. And maybe update the comments and expect_err calls of those existing cases, since they aren't as nicely described as what this PR does.
This commit enhances the Delta Lake log path parsing tests by adding explicit boundary test cases: 1. UUID length validation in test_uuid_checkpoint_patterns: - Tests UUID with exactly 35 characters (one too short) - Uses realistic UUID format: "3a0d65cd-4056-49b8-937b-95f9e3ee90e" - Makes length requirements clearer for future maintainers 2. File extension handling in test_unknown_invalid_patterns: - Tests file with no extension (returns Ok(None)) - Tests file with empty extension "." (returns Ok(Some) with Unknown type) - Clarifies the distinction between these two cases These boundary tests improve test coverage and make the requirements more explicit in the test suite, while maintaining the existing test structure and style.
c3653e5    to
    edbdf1f      
    Compare
  
    
This PR introduces new unit tests to validate the Delta Lake log path parsing logic corner cases thoroughly:
test_invalid_version_and_part_lengths:
test_empty_filename:
are correctly rejected
test_ok_none_and_unknown_cases:
These tests significantly improve coverage of edge cases and error
conditions in log path parsing.